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Abstract 

In  this  paper  we  compare  several  algorithms  for  sparse  Gaussian 
elimination  with  column  interchanges.  The  algorithms  are  all  derived  from 
the  same  basic  elimination  scheme,  and  they  differ  mainly  in  implementation 
details.  We  examine  their  theoretical  behavior  and  compare  their  perfor- 
mances on  a  number  of  test  problems  with  that  of  a  high  quality  complete 
threshold  pivoting  code.  Our  conclusion  is  that  partial  pivoting  codes 
perform  quite  well  and  that  they  should  be  considered  for  sparse  problems 
whenever  pivoting  for  numerical  stability  is  required. 


Key  Words:  Sparse  Gaussian  Elimination,  Sparse  Linear  Systems,  Linear 
Equations,  Analysis  of  Algorithms,  Pivoting  Algorithms 


CR  Categories:  4.6,  5.14,  5.25 


1.  Introduction 

Let  A  be  an  NxN  nonsingular  matrix,  and  consider  the  system  of 
linear  equations 

A  x  =  b  (1.1) 

where  x  and  b  are  vectors  of  length  N.  If  N  is  small  or  A  is  dense  (i.e. 
most  entries  a.,  are  nonzero),  then  the  standard  algorithm  for  solving 
(1.1)  is  Gaussian  elimination  with  partial  pivoting  (cf.  Forsythe  and 
Moler  [6]).  This  algorithm  is  quite  well  understood,  from  both  the 
theoretical  and  practical  points  of  view,  and  it  produces  an  accurate 
solution  x  (provided  that  A  is  well  conditioned  and  the  equations  are 
well  scaled,  both  of  which  we  shall  assume  throughout). 

When  N  is  large  and  the  matrix  A  is  sparse  (i.e.  most  entries  a.  . 
are  zero),  the  situation  is  quite  different.  Now  there  are  two  problems 
to  contend  with:  the  accurate  computation  of  x,  and  the  exploitation  of 
the  large  number  of  zeroes  in  A  to  reduce  storage  and  computing  time. 
When  A  is  symmetric  and  positive  definite,  a  number  of  methods  may  be 
effective,  including  variants  of  Gaussian  elimination  and  iterative  methods 
such  as  conjugate  gradient.  However,  in  most  other  cases,  Gaussian 
elimination  with  pivoting  seems  to  be  the  best  method  currently  available. 

In  this  paper  we  examine  algorithms  for  sparse  Gaussian  elimination 
with  partial  pivoting.  In  Section  2  we  introduce  a  basic  high-level 
algorithm  and  point  out  several  problems  that  arise  in  practice.  In 
Section  3  we  discuss  several  possible  algorithm  refinements  which  solve 
the  practical  problems,  and  we  give  a  theoretical  comparison  of  them  based 
on  their  behavior  in  the  best,  average,  and  worst  cases.  Unfortunately, 
we  do  not  believe  that  our  theoretical  analysis  reflects  what  happens  in 


2 
practice,  so  Section  4  we  give  the  results  of  some  numerical  experiments 
with  our  algorithms  and  compare  them  with  one  of  the  better  existing  codes 
for  Gaussian  elimination  with  pivoting.  Finally,  we  conclude  in  Section  5 
by  suggesting  directions  for  future  development  based  on  the  results  in 
this  paper. 

2.  A  Basic  Algorithm 

In  this  section  we  examine  a  basic  algorithm  for  sparse  Gaussian 
elimination  with  partial  pivoting.  More  precisely,  we  consider  an  algorithm 
which  performs  Gaussian  elimination  with  column  interchanges  on  the  matrix 
A  to  effectively  obtain  a  factorization  of  the  form 

A  Q  =  L  U,  (2.1) 

where  L  is  lower  triangular,  U  is  unit  upper  triangular,  and  Q  is  a 
permutation  matrix  corresponding  to  the  column  interchanges.  Once  this 
factorization  has  been  obtained,  we  can  compute  x  by  solving  the  systems 

L  y  =  b  and  U  QT  x  =  y, 
although  we  will  not  discuss  this  forward  and  back  solution  process  here. 
The  algorithm  we  give  is  a  modification  of  an  algorithm  without  pivoting 
discussed  in  [5]. 


line 

1  For  k  =  1  to  N  do 

2  (Ik  =  {J:  akj*0>; 

3  m  =  min  {j:  j  e  Ik  U  {N  +  1}}; 

4  While  k  >  m  do 

5  <!k  "  h  "  {m>  u  '•» 

6  For  j  £  I  do 

m 

7  (akj  =  akj  "  akm'  amj); 

8  m  =  min  {j:  j  e  Ik  U  {N  +  1}}); 

9  If  m  $  N  then  do 

10  U  =  min  {j:  |a,.|  =  max  (|ak-|:  k  *  i  *  NH; 

11  If  a.  =  0  then  exit  with  failure  (zero  pivot); 

12  Interchange  columns  k  and  l\ 

13  If  m  =  k  then  I.  =  I.  -  {k}; 

14  Else  Ik  =  Ik  -  {£}; 

15  For  j  e  I.  do 

16  (akj  =  akj/akk)); 

17      Else  exit  with  failure  (zero  pivot)); 

Algorithm  2.1 

The  basic  algorithm,  Algorithm  2.1,  is  a  row-oriented  version  of 
Gaussian  elimination  with  column  interchanges.  It  consists  of  N  steps, 
during  each  of  which  one  row  of  A  is  processed.  When  processing  the  k-th 
row  at  the  k-th  step,  I.  is  used  to  hold  the  indices  of  all  columns  con- 
taining nonzeroes  in  the  k-th  row.  Initially  (line  2),  these  are  just 
the  nonzero  columns  of  A  in  the  k-th  row.  Then,  in  increasing  order,  for 


each  m  e  I.  ,  m  <  k,  a  multiple  of  row  m  of  U  is  subtracted  from  row  k  to 
annihilate  a.   (lines  3  -  8).  This  may  cause  fill-in,  i.e.,  the  introduction 
of  new  nonzero  elements  in  the  k-th  row,  so  I.  must  be  updated  (line  5). 
Finally,  when  all  m  <  k  have  been  processed,  I.  contains  the  indices  of 
columns  which  contain  nonzeroes  in  the  portion  of  the  k-th  row  lying  in 
the  upper  triangle  of  U.  If  A  is  nonsingular,  I.  will  not  be  empty,  so 
that  m  s  N  will  hold  in  line  9.  In  that  case  the  algorithm  selects  the 
remaining  nonzero  element  with  maximum  modulus  and  interchanges  its  column 
with  the  k-th  column.  If  A  is  singular,  on  the  other  hand,  then  either 
m  =  N  +  1  in  line  9,  or  a.   =  0  in  line  11.  In  either  case,  the  algorithm 
takes  a  failure  exit.  (However,  this  situation  could  simply  be  handled  by 
storing  a  null  k-th  row,  if  one  wished  to  deal  with  singular  matrices.) 

That  Algorithm  2.1  is  numerically  stable  can  be  shown  quite  easily 
by  relating  it  to  the  application  of  standard  Gaussian  elimination  with 
row  interchanges  to  A  .  In  fact,  assume  that  such  a  procedure  produces  a 


factorization  of  A  as 


P  AT  =  L  U,  (2.2) 


where  L  is  unit  lower  triangular,  U  is  upper  triangular,  and  P  is  a  permu- 
tation matrix  corresponding  to  the  row  interchanges.  Then  we  can  show  that 
Algorithm  2.1  produces  the  factorization  (2.1)  of  A  with  L  =  U  ,  U  =  L  , 
and  Q  =  P  .  Since  the  computation  of  (2.2)  is  numerically  stable  (cf.  [6]), 
that  of  (2.1)  by  Algorithm  2.1  is  also. 

In  Algorithm  2.1  there  are  only  two  operations  whose  costs  can  be 
greatly  affected  by  the  column  interchanges  due  to  pivoting:  the  selection 
of  m  in  lines  3  and  8,  and  the  updating  of  I.  in  line  5.  The  costs  of 
these  operations  may  depend  critically  on  the  representation  of  the  sets 


I,  and  I  (e.g.  ordered  vs.  unordered,  list  vs.  array),  while  the  costs  of 

other  operations  in  the  algorithm  in  general  do  not. 

As  an  illustration,  consider  two  possible  representations  of  I. : 

as  a  linked  list  in  increasing  order  and  as  a  heap  with  smaller  nodes 

nearer  the  root  (cf.  Knuth  [7],  p.  145).  In  the  first  case,  selecting  m 

is  easy,  since  we  want  to  choose  the  first  entry  of  I.  ,  and  the  updating 

operation  in  line  5  can  be  performed  as  a  linear  merge  of  the  two  ordered 

lists  in  0(1 1   +  II.  I)  time.  (We  use  II   and  II.  I  to  denote  the  number 
v i  mi   ik1        v      '  m1     '  k1 

of  entries  in  II  I  and  ILL  respectively,  when  we  start  to  merge  I  into 
1  m '     '  k1  m 

I.  .)  In  the  second  case,  selecting  m  is  still  easy,  since  we  want  to  choose 
the  root  of  the  heap,  but  now  the  updating  operation  may  be  more  costly, 
since  we  must  delete  m  and  separately  insert  each  entry  of  I  into  the  heap 

for  I..  This  may  require  a  total  of  0(1 1  I  log0(|I  I  +  II.  I))  time. 

k        J       ^  '  m1   32  '  m1   '  k1 

From  this  example  it  might  seem  that  there  is  no  question  as  to  the 
proper  course:  simply  use  an  ordered  linked  list  representation  for  I,  (and 
an  ordered  array  representation  for  previously  computed  sets  I  ).  In  fact, 
if  there  were  no  pivoting,  this  conclusion  would  probably  be  correct. 
However,  as  soon  as  column  interchanges  can  occur,  the  situation  is  not  so 
clear-cut.  The  problem  is  that  while  I  was  in  increasing  order  when  it 
was  computed  at  the  m-th  step,  subsequent  pivoting  operations  may  cause  it 
to  be  out  of  order  at  the  k-th  step.  Thus  the  set  union  operation  can  no 
longer  be  performed  as  a  simple  ordered  list  merge.  We  might  try  to  avoid 
this  problem  by  keeping  the  sets  I.  in  some  canonical  order  (say  the  initial 
column  order  of  A),  but  then  the  selection  of  m  would  become  difficult, 
since  it  would  require  a  (necessarily  linear)  search  for  the  minimum 
remaining  column  index  in  I.  (in  the  current  column  order)  at  the  k-th  step. 
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In  the  remainder  of  this  paper,  we  will  use  Algorithm  2.1  as  the 
basis  for  our  discussion  of  various  alternative  means  of  implementing  the 
selection  and  updating  operations  which  seem  so  problematical  when  pivoting 
is  permitted.  The  next  section  contains  descriptions  of  several  possible 
implementations  and  a  limited  theoretical  discussion  of  them.  In  Section 
4,  we  will  present  more  meaningful  comparisons,  based  on  numerical 
experiments . 

3.  Algorithm  Implementations 

In  this  section  we  present  six  different  ways  of  implementing  the 

selection  and  updating  operations  of  Algorithm  2.1.  For  each  we  will 

briefly  describe  the  representations  used  for  I,  and  I  (i.e.  for  I,  during 

k     m         k 

the  k-th  step  and  for  the  previously  computed  sets  I  ).  Then  we  will 

outline  the  methods  used  to  select  m  and  to  form  the  union  I,  U  I  ,  giving 

k   m 

rough  estimates  of  the  costs  of  these  operations  in  the  best  case  (no 
pivoting),  the  average  case,  and  the  worst  case.  By  average  case  here, 
we  mean  the  average  case  with  respect  to  the  N!  possible  reorderings  of 
the  columns  of  A,  not  with  respect  to  all  possible  matrices  A.  While  the 
former  may  be  less  interesting,  the  latter  seems  all  but  impossible  to 
study  except  through  some  (probably  unrealistic)  testing  program  using 
randomly  generated  matrices  A. 

1)  Run  Insertion 

Each  set  I.  is  kept  as  a  linked  list  in  increasing  order  with 
respect  to  the  current  column  order  at  the  k-th  step,  and  the  sets  I  are 
kept  as  arrays  in  increasing  order  with  respect  to  the  column  order  at  the 
m-th  step,  where  they  were  computed.  The  selection  of  m  requires  constant 


time,  since  m  is  always   the  first  remaining  entry  of  I.  .     To  form  I.U  I    , 

K  K. 

we  effectively  split  I  into  its  component  increasing  runs  (cf.  Knuth  [7], 

p.  34)  and  separately  merge  each  of  the  runs  into  I.  .  In  practice,  this 

is  done  by  merging  the  entries  of  I  into  I.  and  backing  up  to  the  beginning 

m      k 

of  I,  whenever  an  out-of-order  entry  is  found  in  I  .  In  the  best  case,  the 

k  J  m 

union  operation  requires  0(|I    |   +   1 1 .  | )   time,  and  in  both  the  average  and 
worst  cases  it  requires  0 ( | I J    •    | I.  | )   time  (cf.   Knuth  [7],  pp.   34-45,  for 
a  discussion  of  the  average  number  of  runs  in  a  sequence). 

2)  List  Merge  with  Bubble  Sort 

The  sets  I.    and  I     are  kept  as  in  (1),  except  that  I     is  reordered 
k  m  v  \   i*  r  m 

whenever  it  is  used.     As  before,  choosing  m  requires  constant  time.     To 

form  I.    U  I    ,  we  first  sort  I     with  a  bubble  sort  (cf.  Knuth  [7],  pp. 
km  m  rr 

106-110)  and  then  perform  an  ordered  list  merge.     In  the  best  case,  this 

will   require  0  ( 1 1    |   +   |lj)   time,  while  in  the  average  and  worst  cases  it 

i      1 2 
will   require  0(   I         +     I,    )   time. 
1   m1  '    k ' 

3)  List  Merge  with  Insertion  Sort 

This  is  identical  with   (2),  except  that  I     is  sorted  with  a  straight 
insertion  sort  (cf.   Knuth  [7],  pp.   80-82)  when  it  is  used.     As  above,  m 
can  be  chosen  in  constant  time,  and  the  set  union  operation  requires 
0(1 1    I   +   1 1,  |)   time  in  the  best  case  and  0(|I    |     +   |l,|)   time  in  the 
average  and  worst  cases. 

4)  List  Merge  with  Shell   Sort 

This  is  also  identical  with   (2),  except  that  I     is  sorted  with  a 

v       r       m 

Shell  sort  (cf.  Knuth  [7],  pp.  84-95)  using  increments  as  suggested  in 


Knuth  [7],  p.  95.  Again  m  can  be  chosen  in  constant  time,  and  the  set 

union  operation  requires  0(1 1  I  +  ll,  I)  time  in  the  worst  case.  In  the 

1  m'    '  k1 

3/2 
average  case,  however,  only  0(|I  |    +  |I.|)  time  is  required  for 


m 


m' 


5)     Heap  Insertion 

The  set  I.    is  split  into  two  parts.     II.    =  {m  e   I.  :     m  <  k}  and 

12.    =  {m  e  I.  :     m  >  k}.     II.    is  kept  as  a  heap  with  smaller  nodes  nearer 

the  root  (cf.  Knuth  [7],  p.   145),  and  12.    is  kept  as  an  unordered  array. 

In  addition  the  characteristic  vector  of  I.    (cf.  Aho,  Hopcroft,  and  Ullman 

[1],  p.  49)   is  kept  to  avoid  storing  duplicate  entries.     At  the  end  of  the 

k-th  step,   II.    is  empty,  and  I.    =  12.  .     Hence  we  assume  that  the  sets  I 

are  stored  as  unordered  arrays.     (In  fact,  there  is  no  loss  in  not  having 

the  sets  I     ordered.)     Choosing  m  in  this  case  still   requires  constant 

time,  since  we  want  to  choose  the  root  of  the  heap  for  II.  .     To  form 

I,    U  I    ,  we  separately  add  each  entry  of  I     either  to  the  heap  for  II,    or 
km  m  k 

to  the  end  of  the  unordered  array  for  12.  .     To  insert  an  entry  into  the 

heap  requires  0(1 og2   | II .  | )   time,  and  to  add  an  entry  to  the  unordered 

array  requires  constant  time.     Thus  we  expect  that  the  best  case  time  for 

the  set  union  operation  will  be  0(1 1    I)  time  (when  all   the  entries  of  I 

are  larger  than  k),  and  that  the  average  and  worst  case  times  will  be 

0(1 1    I   log0  (|I    I  +   1 1,  I)).     Note  that  deleting  m  from  I,    requires 
lml32'ml        '   k1 "  k       ^ 

0(log2    1 1.  | )  time,  a  usually  insignificant  cost. 


6)     Pivot  Search  (cf.  Curtis  and  Reid  [3]) 
The  set  I.    is  kept  as  a  linked  lis 
to  a  canonical   ordering  (e.g.,  the  initial   column  order  of  A),  and  the  sets 


The  set  I.    is  kept  as  a  linked  list  in  increasing  order  with  respect 


I  are  kept  as  ordered  arrays  in  the  canonical  order.  Now  choosing  m 
requires  searching  I.  for  the  minimum  remaining  column  number  in  current 
order,  and  since  this  must  be  done  as  a  linear  search,  it  will  require 
0(| I.  |)  time  on  average.  In  return  for  this  searching,  we  can  again  form 
I.  U  I  in  0(  1 1  |  +  1 1.  | )  time  independent  of  the  amount  of  pivoting, 

since  both  I  and  I,  are  in  increasing  order  with  respect  to  the  canonical 

m     k  3  K 

order. 

4.  Numerical  Results 

In  this  section  we  give  numerical  results  obtained  by  programming 
the  five  implementations  of  Section  3  and  running  the  programs  on  several 
test  problems.  The  results  here  are  not  comprehensive,  but  the  problems 
are  realistic  ones.  The  problem  GS192  arose  in  the  solution  of  a  parabolic 
partial  differential  equation  and  was  supplied  by  Professor  Paul  Saylor 
of  the  University  of  Illinois.  The  others  were  taken  from  a  collection 
of  test  problems  supplied  by  Dr.  Iain  Duff  of  the  Atomic  Energy  Research 
Establishment  at  Harwell,  England.  Brief  descriptions  of  the  problems  are 
given  in  Table  4.1.  (See  Duff  and  Reid  [4]  for  further  discussion  of  the 
Harwell  test  problems.) 
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Problem 

N 

Application 

Remarks 

ARC130 

130 

Laser 
Research 

SHLO 

663 

Linear 
Programming 

Lower  triangular 

SHL200 

663 

Linear 
Programming 

SHL400 

663 

Linear 
Programming 

STRO 

363 

Linear 
Programming 

Lower  triangular 

STR200 

363 

Linear 
Programming 

STR400 

363 

Linear 
Programming 

GS192 

192 

Parabolic 
Partial 
Differential 
Equation 

Banded  with  half 
bandwidth  of  19 

TABLE  4.1 
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Experiments  have  shown  that  the  storage  and  time  costs  of  the 
algorithms  discussed  here  may  depend  heavily  on  the  initial  row  and  column 
orders  of  A.  Except  for  GS192,  we  chose  to  order  the  rows  in  order  of 
increasing  number  of  nonzero  entries  and  to  order  the  columns  symmetrically. 
For  GS192,  we  used  the  matrix  exactly  as  supplied  to  us.  If  the  permutation 
matrix  P  accomplishes  the  desired  row  reordering,  then  we  applied  the 
algorithms  to  the  permuted  system 

P  A  PT(P  x)  =  P  b. 
There  is  little  theoretical  reason  to  expect  that  the  choice  of  P  would 
have  a  significant  effect  on  the  accuracy  of  the  computed  solution  x,  except 
that  reducing  the  number  of  arithmetic  operations  performed  should  give  a 
corresponding  reduction  in  the  roundoff  error  incurred.  In  practice, 
however,  we  found  that  particularly  inefficient  orderings  of  the  rows  led 
to  poor  accuracy,  while  orderings  which  were  at  least  reasonably  efficient 
gave  uniformly  good  results.  Unfortunately,  at  present  we  can  say  no  more 
than  this,  but  we  hope  to  investigate  further  the  problem  of  choosing  the 
initial  row  and  column  orderings.   (Note,  however,  that  the  simple  row 
reorderings  described  above  gave  good  numerical  results  for  our  test 
problems. ) 

To  provide  a  benchmark  comparison  for  our  algorithms,  we  solved 
the  test  problems  with  a  code  due  to  A.  R.  Curtis  and  J.  K.  Reid  [2]  which 
is  part  of  the  Harwell  Subroutine  Library.  This  code  uses  a  rather  more 
sophisticated  complete  pivoting  algorithm  to  factor  A  and  compute  x.  Where 
our  algorithms  pre-order  A  to  exploit  sparseness  and  then  pivot  entirely 
for  numerical  stability,  the  Harwell  code  uses  a  strategy  known  as  threshold 
pivoting  to  do  both  at  once. 
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In  threshold  pivoting  the  pivot  element  at  the  k-th  step  is  chosen 
to  be  that  entry  a..,  k  <  i,  j  <  N,  which  requires  the  fewest  arithmetic 
operations  to  eliminate  and  which  satisfies  either 

|a..|  >,   u-rnax  {|a.  |:  k  $  i   $  N}  or 

| a .  .  |  >*   u«max  { |a  .  |  :  k  $  i   $  N}  , 

where  u  is  a  threshold  parameter  satisfying  0  <  u  $  1.  For  our  experiments 
we  used  two  different  settings  for  u:  u  =  1.0E-6,  to  allow  pivoting  almost 
entirely  to  exploit  sparseness,  and  u  =  1.0,  to  force  pivoting  almost 
entirely  for  numerical  stability.  Note  that  the  form  of  the  threshold  test 
allows  for  some  flexibility  in  either  case.  For  intermediate  values  of  u, 
we  would  expect  the  code  to  exhibit  a  behavior  somewhere  between  the  two 
extremes  we  used. 

Our  experiments  were  run  in  single  precision  on  an  IBM  System/360 
Model  75  using  the  Fortran  IV  Level  G  compiler.  All  of  the  programs  gave 
equally  accurate  results  as  measured  by  the  l_2  and  L^  norms  of  both  the 
absolute  error  in  the  solution  and  the  residual,  except  that  the  Harwell 
code  with  u  =  1.0E-6  was  sometimes  less  accurate  than  the  others.  Hence 
the  important  question  in  our  tests  was  efficiency,  and  we  chose  to  use 
two  measures:  the  number  of  nonzeroes  in  the  computed  LU  factorization 
(as  some  measure  of  storage  cost),  and  the  time  required  to  compute  the 
solution  x  from  the  original  (not  yet  pre-ordered)  system  (1.1).  We  should 
note  that  probably  all  of  the  codes  could  be  made  somewhat  faster  by  more 
careful  coding  and  that  the  times  may  be  in  error  by  approximately  10%  due 
to  system  overhead.  However,  the  results  are  still  qualitatively 
interesting. 
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The  results  shown  in  Table  4.2  show  several  things.  Most  obvious, 
the  partial  pivoting  codes  are  invariably  much  faster  than  the  Harwell 
code,  even  though  they  produce  a  denser  factorization  (and  hence  perform 
more  arithmetic  operations).  Next,  it  appears  that  the  Shell  Sort,  Heap 
Insertion,  and  Pivot  Search  algorithms  performed  worse  than  the  other 
partial  pivoting  algorithms  on  most  of  the  problems.  In  the  first  case 
this  is  probably  due  to  the  nature  of  the  data  which  must  be  sorted;  in 
the  second  case,  this  is  probably  due  to  data  structure  overhead;  and  in 
the  last  case  this  is  probably  due  to  the  fact  that  the  costly  searches 
occurred  in  the  filled-in  rows  at  each  step.  Finally,  it  seems  that  the 
four  remaining  partial  pivoting  algorithms  all  performed  comparably  well, 
with  the  Run  Insertion  algorithm  appearing  best  over  all,  perhaps  because 
it  can  be  implemented  with  little  programming  overhead. 


14 


.c 

LO 

■»-> 

a 

r^ 

O 

CO 

I— 

r-» 

■ — 

1— 

CO 

CO 

CO 

«3- 

t— 

CO 

LO 

O 

CT> 

o 

s- 

LO 

r^ 

CO 

1 — 

CO 

CM 

LO 

r— 

LO 

r-^ 

LO 

CO 

CO 

CO 

00 

• 

> 

rO 

<3- 

• 

LO 

• 

CO 

• 

CO 

• 

cti 

• 

CM 

• 

n— 

• 

CO 

r-^ 

•1 — 

QJ 

t— 

o 

CVJ 

CM 

CM 

CM 

CM 

CM 

CM 

O 

r^ 

<d- 

CO 

^J- 

CO 

r— 

Q. 

CO 

C 

o 

•r- 

r-. 

O- 

+J 

r*» 

00 

CO 

cti 

r^ 

O 

^— 

"St 

CO 

O 

^1- 

00 

CO 

CTi 

0 

LO 

ro 

S_ 

LO 

LO 

CO 

LO 

CO 

LO 

LO 

CO 

LO 

CTl 

LO 

*d- 

CO 

CO 

CO 

• 

CU 

ai 

*3- 

• 

LO 

• 

CO 

• 

CO 

• 

CTl 

• 

CM 

• 

CM 

• 

CO 

0 

DC 

10 

E 

1—4 

o 

CM 

«3- 

CM 

^ 

CM 

»* 

CM 

1^ 

CO 

00 

CO 

CO 

, 

CT> 

1— 

+J 

r^ 

«d- 

CO 

o 

r*. 

co 

^— 

r^ 

CO 

CO 

^j" 

t— 

CO 

«3- 

0 

r-^ 

CU 

s- 

in 

CO 

CO 

r— 

CO 

CM 

LO 

t— 

LO 

«3- 

LO 

co 

CO 

r— 

00 

• 

-C 

o 

<3- 

• 

LO 

• 

CO 

• 

CO 

• 

<x> 

• 

CM 

• 

CM 

• 

CO 

LO 

CO 

CO 

1 

o 

CM 

1 

CM 

1 

CM 

r_ 

CM 

0 

r-. 

*3- 

00 

LO 

CO 

CO 

c 
o 

•r- 

r^ 

4-> 

+-> 

r-» 

o 

CO 

CO 

r»- 

CO 

r— 

«3- 

CO 

0 

■«i- 

00 

CO 

CM 

0 

CT> 

S- 

s- 

LO 

CO 

CO 

^~- 

CO 

r— 

LO 

0 

LO 

LO 

LO 

«d" 

CO 

CO 

CO 

• 

QJ 

o 

«=* 

• 

LO 

• 

CO 

• 

CO 

• 

CTl 

• 

CM 

• 

CM 

• 

CO 

CTt 

(A 

C 
t— i 

CO 

o 

CM 

CM 

CM 

CM 

O 

r^ 

CO 

CO 

CO 

CO 

cu 

,_ 

r— 

+-> 

r^. 

CO 

CO 

en 

r^ 

CM 

r— 

00 

CO 

CO 

^l- 

CM 

CO 

LO 

0 

«d" 

-O 

S- 

LO 

LO 

CO 

CTl 

CO 

r— 

LO 

O 

LO 

«3- 

LO 

CO 

CO 

«d- 

00 

■ 

-O. 

o 

<* 

• 

LO 

• 

CO 

•  ' 

CO 

• 

CTt 

• 

CM 

• 

CM 

• 

CO 

<^- 

• 

3 

CO 

r— 

o 

CM 

o 

CM 

^— 

CM 

^~ 

CM 

0 

r-. 

CO 

CO 

CO 

CO 

1— 

1= 

CM 

CD 

0 

**- 

c 

=3 

LU 

o 

r— 

_l 

•1 — 

CM 

O 

CO 

C 

+J 

r^ 

00 

00 

CO 

r>^ 

CO 

r— 

CTi 

CO 

1— 

*d- 

** 

CO 

CO 

0 

CM 

to 

< 

3 

s_ 

LO 

•3- 

CO 

CO 

CO 

r— 

LO 

r— 

LO 

CO 

LO 

0 

CO 

«d- 

00 

• 

1— 

cc 

CU 

«* 

• 

LO 

• 

CO 

• 

CO 

• 

CTl 

• 

CM 

• 

CM 

• 

00 

O 

CU 

tO 

r— 

o 

CM 

1 — 

CM 

r— 

CM 

r— 

CM 

0 

r^ 

CM 

CO 

CM 

CO 

^-* 

-M 

c 

ro 

t—t 

O 

co 

O 

1— 

1 

* 

(O 

r— 

o 

* 

CM 

<u 

r^ 

ro 

CO 

1^ 

en 

CO 

r— 

CM 

ct» 

«3" 

tT» 

CO 

CM 

LO 

00 

CO 

«^" 

c 

2 

CTl 

CO 

CO 

o 

CM 

CO 

f— 

0 

LO 

*d- 

r— 

CM 

CO 

LO 

«* 

• 

(O 

S- 

II 

CM 

• 

CO 

• 

1^ 

• 

r-» 

• 

*J- 

• 

LO 

• 

r--. 

• 

r-» 

CM 

ro 

r— 

r^ 

^~ 

r^ 

n— 

co 

T— 

r-. 

CM 

>^- 

CO 

CO 

co 

CT> 

CO 

00 

cu 

a: 

3 

3 
Q. 

r— 

o 

E 

^— 

• 

1 — 

CM 

O 

a> 

r— 

CO 

CT> 

r^ 

o 

CO 

r^ 

CM 

CO 

^1- 

CO 

O 

r— 

CTi 

cr> 

CO 

CTl 

O 

s 

CT> 

00 

CO 

en 

CM 

*3- 

r— 

CO 

LO 

CO 

«^" 

a> 

r^. 

• 

CO 

• 

c 

II 

CM 

• 

CO 

• 

r-. 

• 

r^. 

• 

«d- 

• 

CO 

• 

CO 

0 

LO 

O 

O 

ro 

1 — 

CO 

i^ 

LO 

^~ 

CO 

^~ 

CO 

CM 

CO 

CO 

00 

CO 

r— 

CO 

CO 

4-> 

31 

3 

-O 

CU 

• 

• 

• 

• 

• 

• 

• 

• 

^~ 

co 

CO 

to 

to 

to 

to 

to 

to 

*r— 

f-4 

(J 

rvl 

o 

r-vi 

0 

M 

0 

M 

0 

M 

CJ 

M 

<J 

IVI 

0 

rO 

z 

CD 
CO 

2: 

CO 

2: 

CU 
CO 

21 

cu 

CO 

z: 

cu 

CO 

2: 

ai 

CO 

Z 

CU 
CO 

2: 

cu 

CO 

<4- 

0 

"O 

.c 

o 

-M 

.c 

/E 

o 

O 

O 

c 

) 

c 

3 

flj 

■•-> 

/     OJ 

CO 

0 

O 

c 

> 

c 

3 

CN 

J 

E 

*     . 

/         ^~- 

I— 

C 

D 

CM 

«3" 

c 

3 

CN 

j 

*j 

; 

0 

^ 

s:  y 

.C 

C_3 

J 

—1 

_l 

a 

* 

a 

a 

cu 

o 

Cd 

5 

:r 

:n 

h 

1- 

l- 

(/ 

) 

-C 

J- 

<X. 

u 

7 

CO 

CO 

u 

■) 

c 

> 

c/ 

) 

c 

J 

r- 

Q_ 

■»c 

15 
5.  Conclusion 

There  seem  to  be  several  directions  for  future  research  in  this 
area.  On  the  theoretical  side,  there  is  a  particularly  troubling  problem 
which  may  be  stated  as  follows.  Let  C(N)  be  the  cost  (in  time)  of  solving 
a  sparse  system  (1.1)  without  pivoting.   (Assume  for  a  moment  that  this 
is  numerically  stable.)  It  would  be  desirable  for  a  pivoting  algorithm 
to  have  a  cost  which  was  0(N«C(N))  in  the  worst  case  and  which  declined 
to  C(N)  as  the  amount  of  pivoting  decreased.  Unfortunately,  none  of  the 
algorithms  discussed  here  has  both  properties,  and,  in  fact,  it  is  unclear 
whether  any  algorithm  could  have  them. 

In  a  more  practical  vein,  it  is  desirable  to  continue  the  testing 
and  comparison  program  begin  in  the  work  reported  in  this  paper.  One 
problem  of  importance  is  the  selection  of  the  initial  row  ordering. 
Another  question  which  might  be  investigated  is  whether  there  is  some 
threshold  technique  which  might  be  applied  to  partial  pivoting  as  discussed 
here.  Since  no  information  is  kept  about  the  column  structures  of  A,  L, 
and  U,  the  strategy  used  in  the  Harwell  code  is  not  suitable,  but  it  might 
be  possible  to  use  a  threshold  parameter  to  decide  whether  or  not  to  pivot 
at  all  in  a  certain  row.  Reducing  the  amount  of  pivoting  in  this  way 
might  make  the  algorithms  run  faster  without  significantly  affecting 
accuracy. 

Finally,  we  must  turn  to  the  eventual  goal  of  work  such  as  ours, 
the  production  of  useful  mathematical  software.  It  is  true  that  the 
behavior  of  a  partial  pivoting  code  depends  quite  heavily  on  the  specific 
problem  being  solved,  in  particular  on  the  amount  of  pivoting  required 
and  on  the  initial  ordering  of  the  rows  and  columns  of  A.  Hence  our 
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sample  of  problems  may  provide  a  biased  and  incorrect  view  of  the  true 
situation.  However,  we  believe  that  our  results  do  provide  enough 
encouragement  to  justify  the  development  and  extensive  testing  of  high 
quality  subroutines  based  on  one  or  more  of  our  algorithms. 
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